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Accelerated algorithms for simulating the morphological evolution of strained heteroeptiaxy based 
on a ball and spring lattice model in three dimensions are explained. We derive exact Green's func- 
tion formalisms for boundary values in the associated lattice elasticity problems. The computational 
efficiency is further enhanced by using a superparticle surface coarsening approximation. Atomic 
hoppings simulating surface diffusion are sampled using a multi-step acceptance-rejection algorithm. 
It utilizes quick estimates of the atomic elastic energies from extensively tabulated values modulated 
by the local strain. A parameter controls the compromise between accuracy and efficiency of the 
acceptance-rejection algorithm. 

PACS numbers: 



Epitaxial growth techniques enable deposition of a 
dislocation-free thin film on a substrate of a differ- 
ent material with a mismatched lattice constant. For 
film-substrate combinations such as Ge/Si, InAs/GaAs 
and InAs/InP, arrays of three-dimensional (3D) coher- 
ent islands self-assemble spontaneously beyond certain 
film thicknesses under appropriate growth conditions 
P; S H; [3- These studies are of much current inter- 
est since they arc expected to find applications in future 
microelectronic devices. 

One of the most widely studied examples is Ge/Si(100) 
with a 4% lattice misfit. Relatively flat islands called 
prc-pyramids start to emerge at 3 monolayers (MLs) of 
Ge coverage [1, H, 01- Upon further deposition, they 
quickly grow into truncated pyramids bounded by four 
(105) facet planes on the sides and subsequently into 
fully grown pyramids which are also called hut islands. 
Upon still further deposition, they become dome islands 
bounded mainly by (113) facet planes. Finally, large dis- 
located islands appear. For the closely related alloy vari- 
ant Sii_a;Gea;/Si(100) with a generally lower 4a;% misfit, 
the development is rather similar and goes through stages 
characterized by ripples, hut islands, dome islands and 
finally dislocated islands d, H, The structures are 
however larger and each transition is postponed to oc- 
cur at a larger film thickness. The islands are also more 
closely packed. 

Islands self-assemble because they can relieve the elas- 
tic stress in the hetcroepitaxial films. There is theory for 
island formation that emphasizes nucleation. It suggests 
that islands must overcome an energy barrier associated 
with a critical size so that the elastic energy gained can 
balance the extra surface energy cost The theory is 
reasonably consistent with experiments at relatively high 
misfit. At low misfit, the critical island size and hence 
the energy barrier are expected to increase and make nu- 
cleation prohibitively difficult. Island formation mech- 
anisms which do not have a barrier then offer a more 



promising explanation. For example, according to the 
Asaro-TiUcr-Grinfcld (ATG) theory strained 
surfaces are unstable at sufficiently long wavelengths. 
Therefore, shallow ripples first develop from small ran- 
dom initial perturbations and then into islands. For this 
mechanism to operate the temperature must be above 
the surface roughening transition point. This seems to 
be the case for the (100) plane of a Sii-^rGe^, which is 
not a true facet at the experimentally relevant tempera- 
tures [H,!!!. 

Direct simulation of the growth of a strained film is 
much more challenging computationally than for the un- 
strained case because of the long-range nature of elas- 
tic interactions. Early kinetic Monte Carlo simulations 
in two dimensions (2D) based on ball and spring lat- 
tice models by Orr et. al Barabasi [l^, and Khor 
and Das Sarma [l^ have successfully observed island 
formation on strained films at sufficiently high misfits. 
The mechanism is consistent with the nucleation pic- 
ture. These simulations focus on the high misfit regime 
because islands can then be small enough to be compu- 
tationally manageable. More recently, much enhanced 
computational efficiency has been achieved using an ex- 
act Green's function method for the elasticity problem 
of surface atoms. Using also a superparticle coarsening 
approximation and a hopping acceptance-rejection sam- 
pling method, a much wider range of morphologies for 
both high and low temperature regimes characterized re- 
spectively by instability and nucleation roughening mech- 
anisms in both 2D [lo, HH and 3D [l^, have been 
explored. Another advanced approach for the elasticity 
problem based on the fast Fourier transform and a multi- 
grid method applied respectively to the substrate and 
film has also been derived [U, llHl- All of these stud- 
ies work for general film morphologies. Alternatively, 
one can limit the study to the evolution of only shallow 
structures and solve the elastic problem using the half- 
space Green's function in the small-slope approximation 
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Other computational approaches offer different com- 
promises between accurate representation of the physics 
and computational efficiency. For instance, continuum 
simulations using driven diffusion equations with the 
elastic energy term obtained using finite element or simi- 
lar methods are computationally less intensive. The ATG 
instability is readily demonstrated [1^ and effects of sur- 
face anisotropy [l^ as well as island coarsening [s^ can 
be studied in 3D. The importance of alloy segregation 
[Slj and formation of more complicated structures 32 1 
have also been studied recently. Nevertheless, fluctua- 
tions in the atomic scale and lattice discreteness are not 
accounted for and hence island nucleation and growth at 
sub-roughening transition temperatures cannot be stud- 
ied. Also, off-lattice models are more realistic and island 
formation can be simulated at moderate length scales in 
2D [3^. Moreover, molecular models using realistic semi- 
conductor atomic potentials have provided better under- 
standing of island stress distribution [s^l and facet struc- 
tures [35| in static calculations. Finally, first principles 
calculations focus on the statics of fewer atoms but are 
able to provide important estimates for the surface ener- 
gies of relevant facets bounding the islands [H, [13, IH] . 

Thus kinetic Monte Carlo methods based on lattice 
models arc unique in allowing large scale dynamic simu- 
lations for studying properties for which atomic discrete- 
ness is important while fine details of the atomic potential 
and surface energies are of limited qualitative impacts. 
In the following, we will explain in detail the algorithms 
used in Ref. [24I which has enabled some of the most 
efficient lattice based kinetic simulations in 3D reported 
in the literature. 



I. MODEL 

The model parameters of our 3D ball and spring 
model for strained heteroepitaxy are appropriate to the 
Sii_a;Gea;/Si(001) system. It is based on a cubic lat- 
tice with a substrate lattice constant — 2. 71 5 A so 
that gives the correct atomic volume in crystalline 
silicon. The lattice constant a/ of the film material is 
related to the lattice misfit e = {af — as)/af which has a 
compositional dependence e = 0.04a;. Nearest and next 
nearest neighboring atoms are directly connected by lin- 
ear elastic springs with force constants ki = 2eV/a'^ 
and k2 = ki respectively. The shear moduli in (100) 
and (110) directions are given by Gioo = ^2/0^ and 
Giio = (fci + fc2)/2as. They arc equal for our choice 
of fci and ^2 and this leads to better isotropy of our sys- 
tem. The elastic couplings of adatoms with the rest of the 
system are weak and are completely neglected for better 
computational efficiency. Solid-on-solid conditions and 
atomic steps limited to at most one atom high are as- 
sumed. Every topmost atom in the film can hop to a 
different random topmost site within a neighborhood of 
I X I columns with equal probability. Wc put I — 17. 



Decreasing the hopping range does not alter our results 
significantly. The hopping rate of a topmost atom m 
follows an Arrhenius form: 



Rq exp 



nimjl + n2mj2 - ^Ejn - Eq 

knT 



(1) 



Here, nim and are the numbers of nearest and next 
nearest neighbors of atom m. We take 71 = 0.085eV^ and 
72 — 71/2. Single-layer terrace edges along (100) and 
(110) directions have energies per unit length given by 
(71 -I- 2^2) /cLs and a/2(7i -|-72)/as which are close to each 
other. The energy AE^ is the difference in the strain 
energy of the whole lattice at mechanical equilibrium 
when the site is occupied versus unoccupied. Wc put 
Eq = 0.415eV and Rq = 2Do/{(jasf with Do = 3.83 x 

lO^^A'^s"^ and = P This gives the appropriate 
adatom diffusion coefficient for silicon (100) [331 . Note 
that the hopping rates defined in Eq. ([T]) following Ref. 
[TtI obey detailed balance in contrast to those adopted in 
Refs. [3, [1^, ll^l . While not necessarily essential in non- 
equilibrium situations considered here, detailed balance 
adds to the reliability of our results. 

The main numerical challenge of our simulation lies on 
the repeated calculations of the elastic energies AEm of 
topmost atoms in order to compute the hopping rates 
Fm from Eq. ([1]). The elasticity problem can be formu- 
lated as follows. First, we note that the strain in a flat 
film is homogeneous . This provides a convenient refer- 
ence position with displacement ui = for every atom i. 
From Hooke's law after applying linearization, the force 
on atom i exerted by a directly connected neighbor j is 



f ij -^ij (^? 



(2) 



lij)K.ijhij arises from the 



where the symmetric matrix Ky — n,jj 

ulus tensor and bij = {l^j 
homogeneous stress in flat films. The spring constant 
kij equals either fci or k2 for tangential or diagonal con- 
nections respectively. The unit column vector hij points 
from the unstrained lattice position of atom j towards 
that of atom i and t denotes transpose. Furthermore, l^j 
and lij are respectively the natural and homogeneously 
strained spring lengths which follow easily from and 
e. Mechanical equilibrium requires J2j fij ~ for each 
atom i. This leads to a large set of equations coupling the 
displacements Ui of all the atoms. Conventionally, this 
large set of e quat ions is solved directly using approxima- 
tion schemes jlTl [l9| . The solution then gives the elastic 
energy stored in every spring and their sum gives the to- 
tal elastic energy E^ of the whole system in mechanical 
equilibrium. Calculating Es twice with and without the 
atom m and comparing the values give AEm- This has 
to be done in principle for every topmost atom m after 
every successful atomic hop involving non-adatoms. Fast 
algorithms are hence indispensable. 

Our Green's function method to be explained in the 
next section is a variant of the boundary integral method; 
it directly calculates the displacement of the surface 
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atoms only. We show here that this is already sufficient 
for calculating the elastic energy Eg of the whole sys- 
tem. First, a surface atom is defined as one which has 
at least one spring missing. There are hence more sur- 
face atoms than topmost atoms. Consider virtual forces 
which adiabatically increase from to bjj' applied to 
every surface atom j where j' is summed over all miss- 
ing atoms which could have been directly connected to j. 
These forces push all atoms from their mechanical equi- 
librium positions to their homogeneously strained lattice 
positions, i.e. a displacement —uj- The virtual work 
done, > 0, is given by 

W^-l/2j2bjr-Uj (3) 
jj' 

It equals the difference — Eg. Here, E^ denotes 
the strain energy of the homogeneously strained lattice 
and can be straightforwardly computed by simple bond 
counting. Therefore, the elastic energy Es of the whole 
system is given by 

Es^E°^+\Y.^,,,-u, (4) 

n' 

which depends only on the positions of the surface atoms 

II. EXACT GREEN'S FUNCTION METHODS 

We now explain in detail the Green's function method 
which reduces the elasticity problem into one involving 
explicit consideration of only the surface atoms. This 
leads to a greatly reduced set of equations and dramat- 
ically reduces the computational burden. It is analo- 
gous to boundary integral methods [4l[ . Either full-space 
or half-space Green's functions defined on an extended 
lattice with regular boundaries can be used. More im- 
portantly, it is exact for arbitrary morphologies. This 
is in sharp contrast to half-space Green's functions in 
the small slope approximation which are often applied 

First, as a mathematical construct, we enlarge the lat- 
tice representing arbitrary morphology by adding ghost 
atoms with similar elastic properties to form an extended 
lattice with regular boundaries. Unphysical couplings are 
hence introduced but can be exactly cancelled. The pre- 
cise method to achieve this is not unique. We will first 
explain a displacement-based Green's function method 
which has been applied in our simulations [13, [2l|, [2^ . 
For completeness, we will also introduce closely re- 
lated force-based and hypersingular displacement-based 
Green's function formalisms. They are not currently 
adopted but have distinct properties which may lead to 
other applications in the future. For all these formalisms, 
we take periodic boundary conditions in lateral directions 
and fixed boundary conditions at the bottom of the sub- 
strate as dictated by the lattice model. For the top layer 



in the extended lattice, we adopt fixed boundary condi- 
tions but free boundary conditions are equally good since 
all their influences will be exactly cancelled anyway. 

A. Displacement-based formalism 

A real surface atom j experiences an unphysical force 
fjjf by a directly connected ghost atom j given by Eq. 
(12). To cancel all unphysical forces by the ghost atoms, 
we apply an exactly opposite external force 

fl = -T. fn' (5) 

j' 

to atom j where the sum is over all ghost atom j' directly 
connected to the real atom j. Noting that the elastic 
properties of the real and ghost atoms are identical, we 
apply Eq. ([2|) and obtain 

fl = J2^^n'i^3-^f)-bn'] (6) 
j' 

This introduces the unknown displacements Uji for the 
ghost atoms. In this formalism, instead of solving for 
their values, we apply another external force fp on every 
ghost surface atom j' which pushes all ghost atoms back 
to the homogeneously strained positions with Uj' = 0. 

The force fp consists of a term — J^j fj'j which exactly 
cancels the spring forces by the real atoms j as well as a 
term bj'j which provides the necessary homogeneous 
stress. Using Eq. ([2|) and Uj' = 0, wc get 

j 

where j is summed over all real surface atoms directly 
connected to j'. Equation ([6]) also reduces to 

if ^ Y.(^n'^i-^n') (8) 

j' 

With the external forces and fp on the real and ghost 
surface atoms respectively, the real lattice is exactly de- 
coupled from the ghost atoms. Therefore, the problem 
defined on the extended lattice with the applied forces is 
identical to the original physical one. It is hence legiti- 
mate to calculate the displacement of any real atom 
i based on the extended lattice being acted on by the 
external forces, i.e. 

u, = ^G,,,/f + ^G,,./;, (9) 

j j' 

where G denotes the lattice Green's function for the 
extended lattice. An important point is that G is in- 
dependent of the film morphology and can be computed 
numerically prior to the start of the simulation. Substi- 
tuting Eqs. ([7]) and ^ into Eq. we arrive at the 
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main equation of the displacement-based Green's func- 
tion approach 

= ^[(G,, - G,,v)K,yir, - G,/,y] (10) 

n' 

where the double sum is over all pairs of directly con- 
nected real and ghost surface atoms j and j' respectively. 
Although Eq. (fTO|) holds for any atom i, restricting i 
to only real surface atoms, Eq. (fTUl) now constitutes a 
greatly reduced set of equations from which the values of 
Ui for all surface atoms are obtained numerically using 
the bi-conjugate gradient method. The elastic energy Eg 
of the whole system then follows immediately from Eq. 

Our simulations do not require calculating the dis- 
placements of the bulk atoms, which can be a time con- 
suming task. Nevertheless, to obtain for a bulk atom i 
for consistency checks or for the purpose of presentation, 
we simply apply Eq. and substitute the displace- 

ments of the surface atoms to its R.H.S. Figure [T] shows 
the calculated displacements of both surface and bulk 
atoms in the cross-section of a film from a small scale 3D 
simulation. Another interesting property of Eq. (jlO[) is 
that the long-range nature of the elastic interactions is re- 
flected in the coupling coefficient (G^j — Gy')Kjj' which 
decays as 1/r^ in 3D at large r where r is the distance 
between atoms i and j. Therefore, the coupling of the 
displacements of the surface atoms is also long-ranged as 
expected and should not be casually truncated. They can 
be efficiently accounted for using a coarsening scheme dis- 
cussed later. This exact Green's function approach has 
been motivated by lattice patching and bond breaking 
considerations. The derivation is particularly simple and 
intuitive. We have since learned that it is a lattice analog 
of the boundary integral method for continuum elasticity 
[4l|. It is also closely related to an earlier result derived 
algebraically using Dyson's equation [i^] . 



B. Force-based formalism 

We now explain a closely related but distinct Green's 
function approach for the clastic problem which has yet 
been applied in our simulations. The displacements for 
the ghost surface atoms Uji in Eq. © are put to zero 
by applying an additional set of external forces in the 
previous discussion. Alternatively, they can be formally 
solved without resorting to new forces. Equation ([6]) is 
first rewritten as 

ft ^Y.'^Ku'{u,-u,,)-bu.] (11) 

i' 

where the sum is over all ghost atoms i' directly con- 
nected to the real atom i. Now, Uii no longer vanishes 
in general. Similar to Eq. the displacements of both 
real and ghost atoms i and i' can be expressed using 
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FIG. 1: Cross-section of a strained film in 3D with a rough 
surface from a small scale simulation. Each arrow represents 
the displacement vector Ui of an atom i from its homoge- 
neously strained lattice position indicated by the dotted lines. 



the only external forces /J now present and the Green's 
function G for the extended lattice as follows 

3 

u, = Y.^^'oIl (12b) 

3 

where the sums are over all real surface atoms j. Substi- 
tuting the displacements into Eq. (|lip gives 

ji' i' 

This is the main set of equations in the force-based for- 
malism. They are to be solved numerically to obtain the 
external forces /f for all real surface atoms i. The dis- 
placement Ui for the real surface atoms and hence the 
elastic energy can then be readily calculated using 
Eqs. (|12ap and ^ respectively. 

The coupling coefficient in Eq. (fT^l) also decays as 
1 /r^ with the distance r as in the previous displacement- 
based case. For the whole procedure in calculating i?s, 
from both operation counting and actual numerical im- 
plementation, the computational efficiency of this force- 
based formalism is similar to the previous displacement- 
based formalism. However, the main variable ff here 
is not a physical quantity and apparently admits no in- 
tuitive coarsening scheme, in contrast to that for the 
displacement-based case to be discussed in Section IHII 
Thus we have not pursued this approach further. We 
are not aware of any similar formalism reported in the 
literature. 
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C. Hyper-singular displacement-based formalism 

By explicitly demanding mechanical equilibrium of 
the real surface atoms, the displacement based formal- 
ism in Sec. Ill Al can be further developed into another 
form an alog ous to the hyper-singular boundary integral 
method [4l| . For a real surface atom i, equilibrium im- 
plies J2k fik = where k is summed over all directly 
connected real neighbors of i, which are not necessarily 
surface atoms. Applying Eq. ([2]), we get 



V'[Kife(Mj - Uk) - hk] = 



fc 



(14) 



We now apply Eq. ([T0|) twice to express both Ui and 
in terms of the displacement of the surface atoms and 
obtain after simple algebra 

Kjfc [(Gy - Gyv) - {Gkj - Gkj')] Kjj'Uj 

jj'k 

= ^ + ^ K,fc (Gy - Gkj)h,,, (15) 

k jj'k 

Solution of this set of equations gives Uj for all surface 
atoms j. However, the coefficients and constants in the 
equations require more floating point operations to com- 
pute and from actual implementation the equations also 
take more bi-conjugate gradient steps to solve. This for- 
malism hence is not adopted. On the other hand, the 
coupling constant in Eq. (fT5|) decays as \/r^ in 3D with 
the distance r between the corresponding particles. It 
would be interesting to explore if the steeper decay can 
lead to a coarsening scheme more efficient than the one to 
be presented next based on the standard displacement- 
based formalism. 



III. SUPERPARTICLE COARSENING SCHEME 



The reduced set of equations (Eq. (fTO|) ) from the exact 
displacement-based Green's function method for obtain- 
ing the displacements of the surface atoms and hence the 
elastic energy greatly speeds up the computation. Yet, 
it can still be substantially improved using superparticle 
coarsening approaches used in Rcfs. 20, 21, 22, 2^1 The 
particular implementation adopted in Refs. [23 . [23j will 
be summarized and more details can be found in Ref. 
[2^. First, note that the sum over directly connected 
pairs of real and ghost surface atoms in Eqs. (fTO|) is 
equivalent to the sums over broken bonds of real surface 
atoms. We hence rewrite Eq. pO|) as 



3P 



AGy^K/3Uj 



(16) 



where the sum is now over all real surface atoms j and 
each bond direction (3. We put S^jp = 1 if the /3th bond 
of atom j is broken and it is otherwise. Also, AGijp ~ 



Gij — Gij' , Kp = Kjji and bp — bjji where the (3th bond 
of atom j is connected directly to atom j' . 

The idea of the superparticle approach is as follows. 
Finding the strain energy Ai5,„ of atom m needed in 
Eq. (U]) requires calculating the strain energy Eg of the 
whole lattice twice with and without atom m. Certain 
fine details of the surface far away arc obviously unim- 
portant and can be neglected. Specifically, surface atoms 
are grouped into sets called superparticles with the Jth 
of them denoted by O/. We neglect fluctuations within a 
superparticle by assuming identical displacement Ui = ui 
for each member i € 51/. Equation (|16p can then be ap- 
proximated by 



Ul 



E 



A 



IJUJ 



B 



1.1 



where 



A/j ^ £,jpl^Gijp'Kp 

jenj,p 

Bij ^ CjpGijbp 

jenj,p 



(17) 

(18a) 
(18b) 



The index / in G/j and AGj jp denotes the lattice point 
closest to the centroid of the superparticle ^Ij. At this 
point, individual constituent particles within the super- 
particles are still referenced explicitly. To further save 
computation time, we approximate A/j and Bij by 



A/j = ^njpAGijpK^ 
Bij = ^njpGijbp 



(19a) 
(19b) 



respectively. The Green's functions are now completely 
indexed by the centroid positions / and J while 



njp 



E^. 



jl3 



(20) 



is the number of broken bonds in superparticle J in direc- 
tion (3. However, Eqs. (|19p provide good approximations 
only when the superparticles / and J are far apart. We 
still have to use the more accurate Eqs. ^TE\\ for su- 
perparticles close to each other. In addition, since the 
problem is always solved in pairs with identical morphol- 
ogy except for one atom m, terms in Eqs. (fT5|) and ([TO]) 
which remain unchanged in the second calculation are 
properly re-used. In contrast to conventional methods, 
the computation time is dominated by the calculations of 
the coefficients and constants in Eq. (|17p using Eqs. (fTSl) 
and (|19p rather than the subsequent iterative solution of 
the resulting system of equations. At a given coarseness, 
the number of superparticles scales up roughly as log L 
for a substrate with a lateral width L for both 2D and 
3D simulations. The computation time which depends 
on the number of coefficients and constants in Eq. pT]) 
thus scales roughly as (logi)^. Further details including 
the grouping of surface atoms into superparticles can be 
found in Ref. [H. 
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IV. ATOMIC HOPPING 
ACCEPTANCE-REJECTION ALGORITHM 

Even with the Green's function method and the su- 
perparticle approximation, the calculations of the elastic 
energies of the potentially hopping atoms are still by far 
the most time consuming parts in the kinetic simulations. 
We adopt a multi-step hopping algorithm aiming at min- 
imizing the number of these calculations. It can utilize 
justifiable approximations to improve the computational 
speed. Yet, by tuning a single parameter A to be defined 
below, it smoothly crosses over and converges back to the 
original exact hopping algorithm. Therefore, the desired 
compromise between accuracy and speed can be easily 
selected to suit a given set of physical conditions. More 
importantly, impacts of any approximations in the hop- 
ping algorithm can be easily accessed by repeating the 
simulations again with better accuracies. 

In the model, each topmost atom is in general allowed 
to hop to another column simulating the surface diffu- 
sion process. The conventional approach is to calculate 
the hopping rate at each column m given by Eq. ([T]) 
and then the hopping atom can be randomly sampled 
according to the associated probabilities. After each suc- 
cessful hopping event, the precise surface configuration 
changes. Due to the long-range nature of the elastic in- 
teractions, AEm and Tm for the whole surface in general 
also change and have to be recomputed. In practice, the 
elastic interactions arc often truncated to a very limited 
range and thus onl y v alues at a small neighborhood re- 
quire updating [T7l.]l9|. 

We adopt an alternative acceptance-rejection scheme 
for efficient sampling of hopping atoms without truncat- 
ing the elastic interactions. First, as will be explained 
later, easily computable upper and lower bounds 17+ and 
ri~ of AEm are available. Equation ([T]) then gives an up- 
per bound 



i?o exp 



kr^T 



(21) 



of the rate F^. All values involved are available and F+ 
for the whole surface can be easily kept up-to-date. They 
are stored in a standard binary tree data structure. At 
each Monte Carlo step, we first sample m using F+ as 
the relative probability efficiently from the binary tree. 
Since the upper bound F+ instead of the true rate Tm is 
used, atom m hence chosen will hop only with an event 
acceptance probability 



Pn 



F^ 



cxp 



(22) 



It appears that we then need to conduct the intensive 
computation of AEm to find pm, but we can do better. 
A lower bound p~ of Pm can easily be obtained from 



cxp 



(23) 



Let ^ be an independent uniform random deviate in [0, 1). 
If ^ < p~ , the hopping event is accepted immediately. An 
explicit calculation of AEm. is avoided and this leads to 
a considerable speed up of the simulation. Otherwise, 
we finally compute AEm as described in the previous 
sections in order to calculate Pm in Eq. (P^ . Using 
the same random number ^, the event is accepted if ^ < 
Pm- Otherwise, it is rejected. It can be easily shown 
that this acceptance-rejection scheme gives the atomic 
hopping rate in Eq. ([T]) noting that the time elapsed in 
a Monte Carlo step is 



At =1/5] F,-! 



(24) 



To calculate the upper and lower bounds required 
above, we use quick estimates flm of the elastic ener- 
gies AEm which will be explained in Sec. |Vl With an 
estimate flm, we put 



(25a) 
(25b) 



where c+ and c~ are dynamically calculated global bi- 
ases. Whenever a calculated value of AEm does not 
lie within the predicted upper bound ri+ by a comfort- 
able safety margin A taken here as A = O.OleV, c+ will 
be increased considerably to attain a more conservative 
bound. Otherwise, it is decreased slightly for a more ag- 
gressive event acceptance rate. The algorithm for is 
analogous. We also note that AEm > so that is 
replaced by if it is negative. For adatoms with elas- 
tic couphngs neglected, ri+ = flm = = 0. In case 
an explicit calculation of AEm is conducted but the hop 
is finally rejected, we put 51+ = fi" ~ AEm until the 
next successful hopping event occurs at a neighborhood 
of m. In particular, for a large safety margin A, the al- 
gorithm reduces into the original exact one in which all 
values of AEm has to be explicitly calculated for all top- 
most atoms after each successful hopping event involv- 
ing a non-adatom. For small A, the algorithm is more 
efficient but under or over sampling of certain hopping 
events may occasionally occur. 

Our acceptance-rejection hopping algorithm is a multi- 
step one with a few rather complex components targeting 
at a good numerical efficiency. It is further complicated 
by additional special rules such as neglecting the elas- 
tic interactions of adatoms and forbidding large atomic 
steps. Reliable software implementation is non-trivial be- 
cause minor coding mistakes affecting only certain sur- 
face configurations often do not lead to disastrous im- 
pacts on the resulting morphology and can be extremely 
difficult to spot. Hence, we devote great efforts to guar- 
antee a reliable implementation. One particularly help- 
ful consistency check is a Boltzmann's distribution test. 
Since our model follows detailed balance, an equilibrium 
surface follows the Boltzmann's distribution. Wc perform 
long test simulations of annealing in small lattices with 
all but two atoms frozen. We make sure that all or most 
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of the not too many possible configurations will be vis- 
ited many times. For each configuration, the combined 
duration should agree with the Maxwell-Boltzmann's dis- 
tribution within the statistical error bar. We repeat the 
test with frozen atoms arranged in a wide range of con- 
figurations to make sure that hopping events in all cir- 
cumstances are simulated with the correct probabilities. 



QUICK ESTIMATES OF ELASTIC 
ENERGIES 



Our hopping algorithms presented in the last section 
requires an easily computable estimate flm for the elas- 
tic energy AE„i of each topmost atom m. We now ex- 
plain our algorithm for generating fl„i. Consider first 
simulations in 2D 2l| . Due to the assumption of linear 
elasticity, AE,n oc e^. We write 



-.1..L) 



(26) 



where $,„ is the elastic energy of an atom m extrapo- 
lated to unit lattice misfit and it depends on the detailed 
surface configuration non-trivially. With long- 

range elastic interactions, AEm and hence depend 
in general on the morphology of the whole surface. We 
split the surface into a local region C centered at m and 
a distant region V which will be treated accurately and 
approximately respectively. Given a specific local surface 
configuration defined in C, AiJ,„ does not only depend 
on e but also significantly on the morphology in V. For 
example, AEm has a much larger magnitude when C is 
situated at a highly stressed valley than at a partially re- 
laxed peak. In fact, what is most relevant is the resulting 
local strain induced by V averaged over C. Let A™ be 
the horizontal component of this coarsened strain. Anal- 
ogous to Eq. (|26p . we hence propose an estimate ftm of 
AEm given by 



(27) 



where the combined effects of the lattice misfit e and the 
morphology in V arc essentially included in A™ . Impacts 
due to microscopic details in 2? as well as other strain 
components are neglected. The elastic energy $ at unit 
strain depends non-trivially on the local surface configu- 
ration. The local configuration is uniquely characterized 
by a set of surface steps while the absolute surface height 
itself is irrelevant. Restricting surface step- heights within 
±2as in the simulations and taking a local region £ of 9 
columns wide, there arc 8 surface steps and the height of 
each of them takes one of five allowed values. For each of 
the 5^ resulting local configurations, 3> is precomputed 
and tabulated before the main simulation commences. 
Specifically, we have used in the precomputation lattices 
of lateral width L = 32. The film thickness at column m 
is Sos well cleared of the substrate. In the distant region 
the thickness equals IGa^ uniformly and this gener- 
ates considerable compression. For every local surface 



configuration, the elastic energy AEm is then explicitly 
calculated using our Green's function approach and $ is 
given by AEm/t^ which is independent of the misfit e 
used in the pre-computation. 

The coarse-grained local strain A™ can be estimated 
easily during the simulation noting that it mainly de- 
pends on structural features in I? at a longer length scale 
and changes relatively slowly. Simply by inverting Eq. 
([27]) and averaging, we obtain 



< AEm > 
<'i'i{hi},ec) > 



(28) 



where < • > denotes averaging over data associated with 
the three most recent explicit calculations of AEm ■ More 
precisely. Am defined here is consistently smaller than 
those proportional to the local strain since $ is precom- 
puted with the local surface located at a valley instead 
of a flat plane. This however leads to no degradation in 
the performance as replacing $ by c$ for all local con- 
figurations for any constant c > leaves our algorithm 
invariant. 




FIG. 2: Film profile in the distant region T) for the precom- 
putation of the strain energy table. The local region £ is 
shaded in black. 

For 3D simulations [13], we generalize Eq. (|?7|) and 
approximate the clastic energy AEm by 



>^lm'^xi{h^hec) + ^Im'^yiih^hec) 



(29) 



where Xj;m and Xym are components of the coarse-grained 
local strains in planar directions. The local region C now 
consists of 13 columns as shown in Fig. [21 Taking into 
account that step- heights are restricted within ±1 in our 
3D simulations. 3^^ possible surface configurations char- 
acterized by a set of 12 step heights [4^ are considered. 
For each configuration, is calculated and tabulated 
using <Pa: = AEm/e'^ which follows from Eq. ^ assum- 
ing that the local region is strained only in the x direction 



with At 



e and A„ 



0. More precisely, the precom- 



putation is based on films on a 32 x 32 x 32 substrate. The 
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frozen morphology in the distant region V is also shown 
in Fig. O It is the simplest morphology which generates 
a significant strain only in the x direction. Then is 
obtained from by symmetry. 

In contrast to the 2D case, X^m and Ay,„ cannot be 
solved directly but are instead obtained from a sim- 
ple least square fit. We define an error measure Sm = 
{AEm — r^m)^- Whenever and explicit calculation of 
AEm is conducted, the local strain components are up- 
dated according to the steepest descent approach as fol- 
lows 



A. 



d£„ 
dXxi 
d£„ 
dXy, 



(30) 
(31) 



where the rate constant rj is taken as 0.5. Furthermore, 
Xxm and Xym are relatively smooth functions of position. 
To suppress statistical errors, after each steepest descent 
step, they are further replaced by weighted averages with 
values at neighboring columns, i.e. 

Xxm < wXxm + (1 - W) < X^i >m (32) 

Aym < wXyrn + (1 - w) < Xyi >,„ (33) 

where w = 0.8 and < ... >,„ denotes averaging over the 
8 neighboring columns of site m. 

VI. RESULTS 

Figure[3]shows an initially flat film from a typical simu- 
lation after annealing for 66/is at a lattice misfit e = 6% 
and temperature T = lOOOTi'. Islands have started to 
form and the roughening mechanism is expected to fol- 
low the Asaro-Tiller-Grinfeld instability as explained in 
Rcf. [Ill- The nominal thickness is 5 monolayers. The 
substrate used has 128 x 128 x 64 lattice sites and this 
is the largest considered in similar kinetic simulations 
reported in the literature. The pre-computed Green's 
functions have been calculated for an extended lattice 
with 128 X 128 x (64 4- 30) sites so that films with a 
maximum local thickness up to 30 layers can be simu- 
lated. Totally, we have carried out 6.5 x 10^ hopping 
attempts (Monte Carlo steps) each of which involves a 
sampling event according to Eq. (j2ip based on the up- 
per bound ri+ of the elastic energy. About 2.2 x 10^ of 
them are successful hops by adatoms with elastic interac- 
tions neglected. Of the remaining 4.3 x 10^ non-adatom 
hopping attempts, 1.8 x 10^, i.e. 42% are accepted di- 
rectly using Eq. ((23|) based on the lower bound fi" . In 
the other 2.5 x 10^ attempts , AE„i is explicitly com- 
puted and a further 0.6 x 10^ attempts, i.e. 13% are 
accepted using Eq. (j^ . Therefore, a combined 55% 
of all non-adatom hopping attempts are accepted while 
the remaining 45% are rejected. Totally, there are hence 
4.6 X 10^ successful hopping events with 2.4 x 10^ of them 
involving non-adatoms. About 1.04 explicit calculations 




20 40 60 80 100 120 




FIG. 3: Surface from a simulation of annealing at misfit 
e = 6% and temperature T = lOOOA' in top view and 3D 
view. The peak to peak roughness is 9 monolayers. 
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FIG. 4: Plot of Q,m against AEm where AEm is the elastic 
energy of a surface atom calculated using the superparticle 
approach and Qm is its quick estimate. 
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of AEm are carried out for each successful non-adatom 
hopping event compared with nearly 128 x 128 calcu- 
lations in principle. Figure H] plots the quick estimates 
flrn against the more accurate values AEm obtained us- 
ing the superparticle approximation. The pairs of values 
are sampled randomly throughout the simulation. For 
the whole simulation, AE„i actually lies outside the pre- 
dicted bounds only with a small chance of less than 0.8%. 
The resulting over- or undcr-samplings of hopping events 
have shown to be negligible in smaller scale simulations. 

The simulations take 18 days to execute on a 2.2GHz 
Core Duo Pentium computer. The repeated calculations 
of AE,n consume about 85% of the CPU time. We use 
110 superparticles. Each calculation takes about 0.5s. 
About 95% of that goes to computing the coefficients 
and constants in setting up Eq. (jlOp . This part of our 
codes have been carefully written to implement both par- 
allel processing with the dual cores and vector process- 
ing with arrays each consisting of four single precision 
floating point numbers provided by the streaming SIMD 
Extensions 2 (SSE2) of the processor. The Intel C-|— I- 
compiler is used. The remaining 5% of the computation 
load is spent on the iterative solution of the equations 
using the bi-conjugate gradient method. It uses Intel's 
mathematical kernel library which also takes advantage 
of the parallel and vector facilities. We use a relatively 
small tolerance of 0.0001% as the convergence condition 
for the iterative solution of the superparticle displace- 
ments. The overall error in the calculation of the elastic 
energy AEm is numerically checked to be within about 



5% which is essentially due to the superparticle approx- 
imation only. The run time of a simulation is expected 
to scale roughly as L^(logL)^ for a substrate with L x L 
surface sites. The factors and (logL)^ respectively 
account for the number of hopping surface particles and 
the computation load for each elastic energy calculation 
as explained in Sec. III. For instance, when we repeat 
our simulation using a smaller substrate with L = 64, it 
takes about 3 days to execute. 



VII. DISCUSSION 

Algorithms for fast Monte Carlo simulation of the mor- 
phological evolution of strain heteroepitaxial thin films 
are presented. A Green's function approach and a super- 
particle surface coarsening scheme enable efficient cal- 
culations of the elastic energies of film atoms. Atomic 
hopping events following rates in the Arrhcnius form are 
selected using an acceptance-rejection algorithm. The 
algorithm utilizes estimates of the elastic energies of top- 
most atoms easily computable from tabulated values for 
similar local surface configurations after taking into ac- 
count local strains. With these algorithms, kinetic Monte 
Carlo simulations have been conducted much more effi- 
ciently than it was possible previously. 

This work was supported by HK RGC, Grant No. 
PolyU-5009/06P. LMS is supported by NSF grant DMS 
0553487. 
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